Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your functionality in stages, which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation of your project. Note that some sections of the implementation are optional, and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.


Step 1: Dataset Exploration

Visualize the German Traffic Signs Dataset. This is open ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!

The pickled data is a dictionary with 4 key/value pairs:

  • features -> the image pixel values, (width, height, channels)
  • labels -> the label of the traffic sign
  • sizes -> the original width and height of the image, (width, height)
  • coords -> coordinates of a bounding box around the sign in the image, (x1, y1, x2, y2)
In [1]:
# Load pickled data
import numpy as np
import pickle
import tqdm
import skimage.transform
from sklearn.preprocessing import LabelBinarizer
%pylab inline

# TODO: fill this in based on where you saved the training and testing data
training_file = './../Downloads/train.p'
testing_file = './../Downloads/test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
Populating the interactive namespace from numpy and matplotlib
In [2]:
n_train = X_train.shape[0]
n_test = X_test.shape[0]
image_shape = X_train[0].shape
n_classes = len(set(y_train))

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
print(X_train.shape)
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43
(39209, 32, 32, 3)
In [3]:
import pandas as pd
df = pd.read_csv("signnames.csv")

def draw(X, y):
    for j in range(3):
        figure(figsize=(20,10))
        for i in range(5):
            subplot(151+i)        
            title(df.iloc[y[i+j*5]]['SignName'])
            imshow(X[i+j*5])
        show()

print("Look at Training")
draw(X_train, y_train)
print("Look at Testing")
draw(X_test, y_test)
Look at Training
Look at Testing

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Your model can be derived from a deep feedforward net or a deep convolutional network.
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

In [4]:
from skimage.exposure import equalize_hist

X_test = equalize_hist(X_test) #- 0.5
X_train = equalize_hist(X_train) #- 0.5

print("Look at Training")
draw(X_train, y_train)
print("Look at Testing")
draw(X_test, y_test)
Look at Training
Look at Testing

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [5]:
binarizer = LabelBinarizer().fit(y_train)
y_train = binarizer.transform(y_train).astype(np.float32)
y_test = binarizer.transform(y_test).astype(np.float32)

from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train, random_state=42)
X_v, X_t, y_v, y_t = train_test_split(X_test, y_test, test_size=0.5, random_state=42)
In [6]:
print("Look at Training")
draw(X_train, np.argmax(y_train, axis=1))
print("Much better...")
Look at Training
Much better...

Question 1

Describe the techniques used to preprocess the data.

Answer:

I did not do much preprocessing:

  • Histogram equalization, to help the network a little bit on the very dark, low-contrast images
  • Shuffling the data (well, that is obvious)
  • LabelBinarizer to get to the 43-class one-hot vector
  • Train/test splitting.

Specifically, I did not center the image values around 0 as is sometimes (but not always) recommended. I tried both and got very similar results.
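The equalization step and the centering variant mentioned above can be sketched in plain numpy (a simplified stand-in for skimage.exposure.equalize_hist, which is what the notebook actually uses; the tiny example image is made up):

```python
import numpy as np

def equalize(channel):
    """Histogram equalization for one channel with values in [0, 1].
    A plain-numpy sketch of what skimage.exposure.equalize_hist does:
    each pixel is mapped to its empirical CDF value."""
    flat = channel.ravel()
    ranks = np.searchsorted(np.sort(flat), flat, side='right')
    return (ranks / flat.size).reshape(channel.shape)

img = np.array([[0.1, 0.1], [0.2, 0.9]])   # hypothetical low-contrast patch
eq = equalize(img)
# values are spread evenly over (0, 1]: [[0.5, 0.5], [0.75, 1.0]]
centered = eq - 0.5  # the centering variant that gave very similar results
```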

In [7]:
# Additional Data and iterator functions
# Note: inspired by keras.preprocessing.image

import scipy.ndimage as ndi
import itertools
def apply_transform(x, transform_matrix):
    x = np.rollaxis(x, 2, 0)
    final_affine_matrix = transform_matrix[:2, :2]
    final_offset = transform_matrix[:2, 2]
    channel_images = [ndi.interpolation.affine_transform(x_channel, final_affine_matrix,
                      final_offset, order=0, mode='nearest') for x_channel in x]
    
    x = np.stack(channel_images, axis=0)
    x = np.rollaxis(x, 0, 3)
    
    return x

def random_transform(x, rg,height_shift_range,width_shift_range,shear_range,zoom_range):
    
    h, w = x.shape[0], x.shape[1]
    
    theta = np.pi / 180 * np.random.uniform(-rg, rg)
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                                [np.sin(theta), np.cos(theta), 0],
                                [0, 0, 1]])
    
    tx = h * np.random.uniform(-height_shift_range, height_shift_range) 
    ty = w * np.random.uniform(-width_shift_range, width_shift_range) 
    translation_matrix = np.array([[1, 0, tx],[0, 1, ty],[0, 0, 1]])
    
    shear = np.random.uniform(-shear_range, shear_range) #0.5
    shear_matrix = np.array([[1, -np.sin(shear), 0],[0, np.cos(shear), 0],[0, 0, 1]])
    
    
    zx, zy = np.random.uniform(zoom_range[0], zoom_range[1], 2)
    zoom_matrix = np.array([[zx, 0, 0],[0, zy, 0],[0, 0, 1]])
        
        
    transform_matrix = np.dot(np.dot(np.dot(rotation_matrix, translation_matrix), shear_matrix), zoom_matrix) 
    transform_matrix = transform_matrix_offset_center(transform_matrix, h, w)
    
    x = apply_transform(x, transform_matrix)
    return x

def transform_matrix_offset_center(matrix, x, y):
    o_x = float(x) / 2 + 0.5
    o_y = float(y) / 2 + 0.5
    offset_matrix = np.array([[1, 0, o_x], [0, 1, o_y], [0, 0, 1]])
    reset_matrix = np.array([[1, 0, -o_x], [0, 1, -o_y], [0, 0, 1]])
    transform_matrix = np.dot(np.dot(offset_matrix, matrix), reset_matrix)
    return transform_matrix

def trans(X): 
    for element in itertools.cycle(X):
        yield random_transform(element, rg=0.2,height_shift_range=0.1,width_shift_range=0.1,shear_range=0.5,zoom_range=(0.9,1.1))

def forward(X):
    for element in itertools.cycle(X):
        yield element
        
Xtrain_iterator = trans(X_train)
ytrain_iterator = forward(y_train)



Xv_iterator = forward(X_v)
Xt_iterator = forward(X_t)
yv_iterator = forward(y_v)
yt_iterator = forward(y_t)

Question 2

Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?

Answer:

I generated training data on the fly by setting up a pipeline inspired by Keras' ImageDataGenerator (I also borrowed their idea of using ndimage and matrix transforms instead of a full image library). This way the training loop never sees the exact same image, only a slightly jittered version. This was the biggest single improvement, taking me from about 95% to 98%. I split the test data 50%/50% into validation and test sets.

In [8]:
import tensorflow as tf
batch_size = 256
patch_size = 3
depth = 10
num_hidden = 512


tf_data = tf.placeholder(tf.float32, shape=(batch_size, 32, 32, 3))
tf_labels = tf.placeholder(tf.float32, shape=(batch_size, n_classes))
tf_keep = tf.placeholder(tf.float32)


# Variables.
conv1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 3, 32], stddev=0.1))
conv1_biases = tf.Variable(tf.zeros([depth]))  # note: the conv biases are defined but never used in model()


conv2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 32, 32], stddev=0.1))
conv2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

conv3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 32, 64], stddev=0.1))
conv3_biases = tf.Variable(tf.zeros([depth]))


conv4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 64, 64], stddev=0.1))
conv4_biases = tf.Variable(tf.constant(1.0, shape=[depth]))


conv5_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 64, 128], stddev=0.1))
conv5_biases = tf.Variable(tf.zeros([depth]))

conv6_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 128, 128], stddev=0.1))
conv6_biases = tf.Variable(tf.zeros([depth]))




layer3_weights = tf.Variable(tf.truncated_normal([2048, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, n_classes], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[n_classes]))


def model(data):
    #cov
    conv = tf.nn.conv2d(data, conv1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],padding='SAME')

   
    
    conv = tf.nn.conv2d(conv, conv3_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv4_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],padding='SAME')
    
    conv = tf.nn.conv2d(conv, conv5_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv6_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],padding='SAME')
    
    
    

    shape = conv.get_shape().as_list()
    reshape = tf.reshape(conv, [shape[0], shape[1] * shape[2] * shape[3]])

    reshape = tf.nn.dropout(reshape, tf_keep)
    
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    hidden_drop = tf.nn.dropout(hidden, tf_keep)
    return tf.matmul(hidden_drop, layer4_weights) + layer4_biases
  

logits = model(tf_data)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_labels))
prediction = tf.nn.softmax(logits)

optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

My model is a 6-layer CNN with 3x3 patches and max-pooling after every two conv layers, increasing the depth from 3 to 32 to 64 to 128, followed by a dropout layer, a dense layer with 512 nodes, another dropout layer, and the output. This seems very standard... I tried a lot of other architectures, but this one got the best result. I would be very interested in better designs, though! Please comment on how to get the 99% mentioned in some papers; I tried some of the architectures from those papers but did not get good results (about 95%).
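As a sanity check on the 2048 input size of the first dense layer (a bit of arithmetic, not part of the submitted code): three 2x2 max-pools halve the 32x32 spatial size three times, and the last conv stack outputs 128 channels:

```python
# Trace the spatial size through the three 2x2 max-pools.
side = 32
for _ in range(3):
    side //= 2           # 32 -> 16 -> 8 -> 4
flat_size = side * side * 128   # 128 channels after conv5/conv6
print(flat_size)  # 2048, matching layer3_weights' input dimension
```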

In [9]:
def accuracy(predictions, labels):
    """This could be implemented in tensorflow"""
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

epoch = 100
num_steps = int(np.ceil(n_train / batch_size))
num_steps_val = int(np.ceil(n_test / batch_size / 2))
num_steps_test = int(np.ceil(n_test / batch_size / 2))
In [ ]:
with tf.Session() as session:
    tf.initialize_all_variables().run()
    for i in range(epoch):
        for b_size in tqdm.tqdm(range(num_steps)):
            #offset = (step * batch_size) % (y_train.shape[0] - batch_size)
            batch_data = np.array([Xtrain_iterator.__next__() for i in range(batch_size)])
            batch_labels = np.array([ytrain_iterator.__next__() for i in range(batch_size)])
            feed_dict = {tf_data : batch_data, tf_labels : batch_labels, tf_keep:0.5}
            _, l, predictions = session.run([optimizer, loss, prediction], feed_dict=feed_dict)
        for b_size in tqdm.tqdm(range(num_steps_val)):
            #offset = (step * batch_size) % (y_train.shape[0] - batch_size)
            batch_data = np.array([Xv_iterator.__next__() for i in range(batch_size)])
            batch_labels = np.array([yv_iterator.__next__() for i in range(batch_size)])
            feed_dict = {tf_data : batch_data, tf_labels : batch_labels, tf_keep:1.0}
            l2, predictions = session.run([loss, prediction], feed_dict=feed_dict)
        
        print(l, l2)

    #print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(feed_dict={tf_keep:1.0}), y_test_onehot[128*10:128*20]))
    
    saver = tf.train.Saver()
    save_path = saver.save(session, "./model-new.ckpt")
 77%|███████▋  | 119/154 [01:49<00:32,  1.08it/s]

Note: the output here is messed up. I don't want to run this again without a good reason because it took a whole night. The loop feedback could be improved by averaging the loss over all batches, but naturally that would not change the outcome. It shows the loss of the last batch in each epoch and on the validation set. Next, let's load the model and check the test set.

In [10]:
loss_array = []
acc_array = []
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "./model-new.ckpt")
    print("Model restored.")
    for i in range(num_steps_test):
        batch_data = np.array([Xt_iterator.__next__() for i in range(batch_size)])
        batch_labels = np.array([yt_iterator.__next__() for i in range(batch_size)])
        feed_dict = {tf_data : batch_data, tf_labels : batch_labels, tf_keep:1.0}
        l2, predictions = sess.run([loss, prediction], feed_dict=feed_dict)
        loss_array.append(l2)
        acc_array.append(accuracy(predictions, batch_labels))
print("Accuracy %")
print(np.mean(acc_array))
print("Loss")
print(np.mean(loss_array))
print("")
Model restored.
Accuracy %
98.46875
Loss
0.0909412

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

I used

  • Adam optimizer (it seems to be the de-facto standard)
  • batch size of 256
  • 100 epochs (not exact: each epoch leaks a bit because the last batch is not quite 256 samples, so the cycling iterator starts over early)
  • dropout keep probability of 50%

These parameters are not rigorously optimized; it was more of a "this works" approach.
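The epoch leak mentioned above can be quantified: each epoch draws ceil(n_train / batch_size) full batches from the cycling iterator, so a few samples of the next pass are consumed early:

```python
import math

n_train, batch_size = 39209, 256
num_steps = math.ceil(n_train / batch_size)   # 154 batches per "epoch"
seen = num_steps * batch_size                 # 39424 samples drawn
overlap = seen - n_train                      # 215 samples recycled early
print(num_steps, overlap)  # 154 215
```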

Question 5

What approach did you take in coming up with a solution to this problem?

Answer:

Since it is an image recognition problem, I started with a basic CNN (like the MNIST one in the TensorFlow tutorial on the TensorFlow website). Then I increased the size and added more dropout, but that limited me to 94% or so, so I added image jittering.

I also tried the techniques used in the Sermanet paper but was not successful with those at all.


Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

In [11]:
import os
import skimage.data
from skimage.transform import resize

newdata = np.array([resize(skimage.data.imread("./collect-japan/"+name), 
                  output_shape=(32,32,3)) for name in os.listdir("./collect-japan/")])

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [12]:
newdata = equalize_hist(newdata)
newdata_it = forward(newdata)

with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "./model-new.ckpt")
    print("Model restored.")
    for i in range(1):
        batch_data = np.array([newdata_it.__next__() for i in range(batch_size)])
        feed_dict = {tf_data : batch_data, tf_keep:1.0}
        predictions = sess.run([prediction], feed_dict=feed_dict)
Model restored.

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.

Answer:

I took a few more... one challenge is that they are Japanese (local) signs and that some are not in the dataset (no parking), so perfect results cannot be expected, but I want to see what the network says anyway. Let's have a look:

In [15]:
p = (np.argmax(predictions[0], axis=1))
draw(newdata, p)

Question 7

Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the dataset?

Answer:

Yes, it did really well! It can even classify a Japanese stop sign even though it has never seen one.

It is not able to classify the no-parking sign correctly, but that is because it is, surprisingly, not in the training dataset... looking at the table below, it is also quite confident in the wrong prediction. Not good, but that can only be fixed with more data. It confuses it with "Keep right", which is indeed somewhat similar, and in bad lighting with speed limits and "No vehicles" (but those only with low certainty).

Other problems are also caused by signs not being in the training set, so the network cannot find the right class... "No trucks" is seen as 99% "Speed limit (100km/h)".

In [16]:
for signnum in range(5*3):
    l = sorted([(i,df.iloc[j]['SignName']) for i,j in (zip(predictions[0][signnum],range(43)))],reverse=True)[:5]
    print("Sign %d" % (signnum+1))
    print("%\t Name")
    for p,sign in l:  
        if int(p*100)>0: 
            print("%d\t %s" % (int(p*100), sign))
Sign 1
%	 Name
100	 Speed limit (30km/h)
Sign 2
%	 Name
100	 Go straight or left
Sign 3
%	 Name
100	 Speed limit (30km/h)
Sign 4
%	 Name
100	 Stop
Sign 5
%	 Name
100	 Keep right
Sign 6
%	 Name
65	 Keep right
22	 Yield
3	 Speed limit (100km/h)
1	 Speed limit (80km/h)
1	 Go straight or right
Sign 7
%	 Name
60	 Turn right ahead
12	 Traffic signals
6	 End of no passing by vechiles over 3.5 metric tons
3	 Roundabout mandatory
2	 Yield
Sign 8
%	 Name
22	 Speed limit (120km/h)
21	 Speed limit (80km/h)
13	 Keep right
11	 Speed limit (50km/h)
6	 Speed limit (30km/h)
Sign 9
%	 Name
86	 No vechiles
9	 Speed limit (80km/h)
1	 General caution
Sign 10
%	 Name
100	 Stop
Sign 11
%	 Name
99	 Stop
Sign 12
%	 Name
99	 Speed limit (30km/h)
Sign 13
%	 Name
99	 Speed limit (100km/h)
Sign 14
%	 Name
52	 Keep right
20	 Keep left
13	 Go straight or right
8	 Speed limit (100km/h)
2	 Priority road
Sign 15
%	 Name
60	 Turn right ahead
12	 Traffic signals
6	 End of no passing by vechiles over 3.5 metric tons
3	 Roundabout mandatory
2	 Yield

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

Answer: See above, although not with top_k.

  • Sign 15 is uncertain but correct (probably due to bad lighting)
  • Sign 14 is uncertain because this sign is not in the training set, plus bad lighting
  • Signs 6, 8, and 9 are uncertain because these signs are not in the training set, plus bad lighting
  • Sign 7 is uncertain because of bad lighting

Everything else is classified quite confidently. Conclusion: more data in bad lighting might help.
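Since tf.nn.top_k was not used, here is a numpy equivalent that could be run on the stored softmax probabilities (the probs row below is made up for illustration):

```python
import numpy as np

def top_k(probs, k=5):
    """(values, indices) of the k largest entries per row, sorted
    descending -- a numpy stand-in for tf.nn.top_k."""
    idx = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    vals = np.take_along_axis(probs, idx, axis=1)
    return vals, idx

probs = np.array([[0.1, 0.7, 0.2]])   # hypothetical softmax output row
vals, idx = top_k(probs, k=2)
# vals -> [[0.7, 0.2]], idx -> [[1, 2]]
```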

Question 9

If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.

Answer: I only took some pictures with my phone and selected the part with the sign manually. Then I load and resize the images, and they run through the same pipeline as the dataset.

Damn, I would like to run it on my phone directly, but the camera on my old Android phone is broken...

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.